Revisiting bounded context block-sorting transformations

Authors

  • J. Shane Culpepper
  • Matthias Petri
  • Simon J. Puglisi
Abstract

The Burrows-Wheeler Transform (bwt) produces a permutation of a string X, denoted X∗, by sorting the n cyclic rotations of X into full lexicographical order, and taking the last column of the resulting n × n matrix to be X∗. The transformation is reversible in O(n) time. In this paper, we consider an alteration to the process, called k-bwt, where rotations are only sorted to a depth k. We propose new approaches to the forward and reverse transform, and show the methods are efficient in practice. More than a decade ago, two algorithms were independently discovered for reversing k-bwt, both of which run in O(nk) time. Two recent algorithms have lowered the bounds for the reverse transformation to O(n log k) and O(n) respectively. We examine the practical performance for these reversal algorithms. We find the original O(nk) approach is most efficient in practice, and investigate new approaches, aimed at further speeding reversal, which store precomputed context boundaries in the compressed file. By explicitly encoding the context boundaries, we present an O(n) reversal technique that is both efficient and effective. Finally, our study elucidates an inherently cache-friendly, and hitherto unobserved, behaviour in the reverse k-bwt, which could lead to new applications of the k-bwt transform. In contrast to previous empirical studies, we show the partial transform can be reversed significantly faster than the full transform, without significantly affecting compression effectiveness. Copyright © 0000 John Wiley & Sons, Ltd.
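
The definitions in the abstract can be illustrated with a small Python sketch. It is a naive illustration, not the authors' algorithms: the function names are hypothetical, the forward transforms rely on comparison sorting, and the reverse shown is a simple sort-based inversion of the full transform only, not one of the O(nk), O(n log k), or O(n) k-bwt reversal methods discussed above. Breaking ties among rotations that share the same k-symbol prefix by text position (via a stable sort) is an assumption of this sketch.

```python
def bwt(x: str) -> tuple[str, int]:
    """Full bwt: sort all n cyclic rotations and take the last column.

    Also returns the row of the unrotated string, which the inverse needs
    because this sketch does not append a sentinel symbol.
    """
    n = len(x)
    order = sorted(range(n), key=lambda i: x[i:] + x[:i])
    last = "".join(x[i - 1] for i in order)  # x[-1] closes rotation 0
    return last, order.index(0)


def k_bwt(x: str, k: int) -> str:
    """k-bwt: sort rotations on their first k symbols only.

    Assumption: ties keep text order, which Python's stable sort provides.
    """
    n = len(x)
    doubled = x + x  # lets a k-symbol prefix wrap around the end of x
    order = sorted(range(n), key=lambda i: doubled[i:i + k])
    return "".join(x[i - 1] for i in order)


def inverse_bwt(last: str, row: int) -> str:
    """Invert the full bwt by walking from each row to the next rotation."""
    n = len(last)
    first = sorted(last)
    # nxt[j]: the row whose last symbol is the same occurrence as first[j]
    nxt = sorted(range(n), key=lambda i: last[i])
    out = []
    j = row
    for _ in range(n):
        out.append(first[j])
        j = nxt[j]
    return "".join(out)


if __name__ == "__main__":
    last, row = bwt("banana")
    print(last, row)                # full transform and row of the input
    print(k_bwt("banana", 1))       # rotations sorted to depth 1 only
    print(inverse_bwt(last, row))   # recovers the input
```

With k equal to the string length, `k_bwt` sorts on whole rotations and so matches the full transform; smaller k leaves rotations that share a k-symbol context in text order.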

Similar resources

On variants of block-sorting compression using context from both the left and right

The block-sorting text compression algorithm can be viewed as associating a context with each character to be compressed, and then sorting the characters on their contexts. Normally, the context associated with each character is the string to the left of the character. Recently, Ratushnyak suggested that it might be possible instead to build a context by interleaving characters taken alternatel...

Lossless and Near-Lossless Compression of ECG Signals with Block-Sorting Techniques

In this work, we investigate the lossless and near-lossless compression of electrocardiogram (ECG) signals with different block-sorting transformations. We show that transformations with smaller context depths are a better choice for ECG signal compression when speed and memory utilization are considered. Further, we show that compression results of our proposed technique are better than other w...

Enhanced Word-Based Block-Sorting Text Compression

The Block Sorting process of Burrows and Wheeler can be applied to any sequence in which symbols are (or might be) conditioned upon each other. In particular, it is possible to parse text into a stream of words, and then employ block sorting to identify and so exploit any conditioning relationships between words. In this paper we build upon the previous work of two of the authors, describing se...

Text Compression using Recency Rank with Context and Relation to Context Sorting, Block Sorting and PPM*

Recently, the block sorting compression scheme was developed and its relation to statistical schemes was studied, but the theoretical analysis of its performance has not been studied well. Context sorting is a compression scheme based on context similarity; it is regarded as an online version of block sorting and is asymptotically optimal. However, the compression speed is slower and the real performan...

Journal title:
  • Softw., Pract. Exper.

Volume 42

Publication date 2012